Potential benefits of using radioactive ion beams for range margin reduction in carbon ion therapy

Sharp dose gradients and high biological effectiveness make ions such as 12C an ideal tool to treat deep-seated tumors, however, at the same time, sensitive to errors in the range prediction. Tumor safety margins mitigate these uncertainties, but during the irradiation they lead to unavoidable damage to the surrounding healthy tissue. To fully exploit the Bragg peak benefits, a large effort is put into establishing precise range verification methods. Despite positron emission tomography being widely in use for this purpose in 12C therapy, the low count rates, biological washout, and broad activity distribution still limit its precision. Instead, radioactive beams used directly for treatment would yield an improved signal and a closer match with the dose fall-off, potentially enabling precise in vivo beam range monitoring. We have performed a treatment planning study to estimate the possible impact of the reduced range uncertainties, enabled by radioactive 11C ions treatments, on sparing critical organs in tumor proximity. Compared to 12C treatments, (i) annihilation maps for 11C ions can reflect sub- millimeter shifts in dose distributions in the patient, (ii) outcomes of treatment planning with 11C significantly improve and (iii) less severe toxicities for serial and parallel critical organs can be expected.


Scientific Reports
| (2022) 12:21792 | https://doi.org/10.1038/s41598-022-26290-z www.nature.com/scientificreports/ a clinical target volume (CTV) would in both cases lead to the irradiation of healthy tissue and possibly critical structures in the target proximity, potentially inducing toxicities. For serial organs at risk (OAR), a dose higher than the tolerance threshold to any of its subunits is sufficient to increase the probability of a severe side effect significantly. Thus, one would expect that reduced tumor margins, associated with range uncertainty, significantly improve the treatment related toxicity. In fact, the decrease of head and neck toxicities was already predicted following the in-silico range margin reduction study in proton patients 6 . With regard to parallel OAR, where the damage would be correlated with a certain volume or the fraction of the organ receiving a threshold dose, the impact of the margin reduction is also expected to play a positive role however the size of the effect would probably strongly depend on the volumes of the OAR and the tumor.
To fully exploit the benefits of a sharp Bragg peak, a large effort is put nowadays into investigating different range verification methods. Among them, positron emission tomography (PET) remains one of the most widely used approaches to monitor dose delivery in clinical practice by exploiting β + -emitting isotopes 7 . In 12 C ion therapy, the contributors to the PET signal are the radioactive fragments of the primary ions, mainly 11 C and 10 C. The activity peak is then observed upstream of the Bragg peak because these fragments have shorter ranges than the primary projectiles at the same velocity 8 . Unfortunately, the usefulness of PET in 12 C therapy remains marginal and insufficient for a complete range uncertainty elimination for several reasons. First, the half-life of the most abundantly produced radionuclides is too long for instantaneous feedback. Instead, the short-lived radionuclides are produced at a low rate and exhibit a long positron range before annihilation 9 . Second, the spatial shift in the β + -activity and dose peaks as well as the biological washout requires Monte Carlo (MC) simulations 10 to analyze the data. Uncertainties in those calculations together with the low counting rates of β + -emitting fragments currently limit the accuracy of the PET-based range verification method to about 2-5 mm 11,12 .
One could overcome the above-mentioned difficulties by using radioactive ion beams (RIBs), such as 11 C or 10 C for simultaneous treatment and online beam verification 13 . This way, compared to the irradiation with stable ions, all the primaries will potentially contribute to the PET signal, significantly increasing the signal-to-noise ratio 14 . Shorter half-lives of the RIBs (20.33 min and 19.3 s for 11 C and 10 C, respectively), would reduce the time of PET signal acquisition and make the impact of the biological washout less significant 15 . Finally, the activity peaks would match the mean ranges of the primaries and will be correlated with the R80 of the Bragg peaks (SOBP), i.e., the position in the distal fall-off where the dose drops to 80% of the SOBP value 16 .
Until now, the pre-clinical and clinical studies of RIBs were held back by the difficulties in producing RIBs at sufficiently high intensities. However, the interest in these studies has been recently revived following the new developments at modern accelerator facilities, such as the intensity upgrade of the SIS-18 accelerator at GSI (Darmstadt, Germany) in frames of the FAIR facility construction 17 . The work on RIBs characterization and clinical application has been resumed in frames of the BARB (Biomedical Applications of Radioactive ion Beams) project 18 funded by the European Research Council (ERC) in 2020. There have been already two successful experimental campaigns in 2021 aimed at the RIB characterization and first imaging tests with PET detectors, however, these are not part of this work and will be presented in separate contributions 19 .
In this study, instead, we attempt to demonstrate the rationale for using RIBs in clinical practice. First, we estimate the potential accuracy of dose visualization and its shift detection with RIB activity data. We conclude that if the measured annihilation map matches the predicted one, the same should be expected for the dose distribution and thus there will be no longer the need for the margins associated with range uncertainties to be applied. Following that, we perform an in-silico treatment planning study to quantitively estimate the potential benefit from 11 C treatment planning with reduced range margins. By retrospectively robustly re-optimizing patient plans for the head-and-neck and liver tumors with 12 C and 11 C beams, we estimate the impact of range margin reduction on serial (optic nerves) and parallel (liver) OAR on the level of treatment planning (i.e., meeting the dose constraints) and, furthermore, in terms of the predicted normal tissue complication probabilities (NTCP) 20 .

Results
RIB annihilation maps to detect range shifts. Figure 1 shows the results of a single-field dose optimization on a patient CT. The RBE-weighted dose distribution, calculated with TRiP98, is shown in Fig. 1a, while Figs. 1b and 1c show the resulting absorbed dose and time-integrated annihilation maps calculated with FLUKA. In Fig. 1d we include the profiles of the absorbed dose and annihilation distributions along the beam direction, yielded by the three raster files, produced using the default and scaled HLUT. Full distributions, normalized to the respective peak values, as well as the zoom-in of the target region are shown. The positions of the peaks of the annihilation distributions correlate with the positions of the fall-offs of the respective dose distributions, and the distances between them reflect the shifts in the dose distributions caused by either 3.5% or 1% differences in interpreting CTs in terms of the ion path lengths. Notably, the proximal rise of the annihilation distribution, even though not as steep as its distal fall-off, correlates with the location of the frontal edge of the SOBP. This all would ideally allow to spot the deviations in the dose distributions, associated with the range uncertainties, and correct them online.
Margin reduction with 11 C beams-impact on treatment plans. Following the outcomes of the preliminary study described in the previous section, we assumed here the possibility of precise range detection with PET for 11 C beams. If the measured annihilation distribution would match the estimated one, this would automatically mean that actual position of the spread-out Bragg peak also matches its prediction, eliminating the further need for using the range-associated margins during the robust plan optimization. With this assumption we performed the robust biological (RBE-weighted) optimization of adenoid cystic carcinoma (ACC) and liver plans setting the expected range uncertainty in the 11 C plans to 0%, while for the 12  www.nature.com/scientificreports/ dose calculation was done assuming the value of 3.5%. The setup uncertainty of ± 3 mm has kept included for both ions. An example of the field configuration and organ contours for the ACC plan optimization is shown in Fig. 2a. This information was preserved from the original plans used at GSI for patient treatment, where the nerves sometimes were bordering or overlapping with the CTVs. In these cases, since the priority was given to the target coverage (D95 > 95%), sometimes the nerve(s) sparing could have been compromised. Figure 2b summarizes the treatment planning outcomes for both nerves patient by patient, i.e. the percentage of treatment scenarios (out of a total of 21) when the treatment planning constraint (D 0.03cm3 > 2.46 Gy(RBE)) was violated to ensure the sufficient target coverage. The worst outcome would be 0% for both nerves, meaning that it is impossible to meet the planning constraint without compromising CTV coverage. Instead, the best outcome (100% for both nerves) would mean that sufficient target coverage will be achieved in all the treatment scenarios without exceeding the recommended doses for any nerve. The use of 11 C plans with reduced range margins led to a significant improvement in the treatment plan quality. For only 2 patients out of 15 (in contrast to 8 out of 15 in the case of 12 C), the maximal allowed dose was exceeded for both nerves in more than half of the analyzed scenarios. Figure 3a shows the example of the liver patient plan optimized for a uniform CTV (black contour) dose of 10 Gy(RBE). The liver is depicted with a brown contour. As in the ACC plans, the primary goal was achieving the prescribed target dose. The dose constraints set for the remaining healthy liver (as suggested by 21 ), D 700cm3 < 4.2 Gy(RBE) or 3.5 Gy(RBE) per fraction for 4-and 6-fraction schemes, respectively, were met for all the analyzed treatment plans.
Box plots in Fig. 3b-d show the distribution of these parameters' median values from all the patients (i.e., 10 values per group). The outlier points in these plots correspond to the parameters derived for patient 6 with the largest CTV (223 cm 3 ), and the smallest healthy liver volumes (927 cm 3 ). Results of the Wilcoxon Signed-Rank reveal a significant difference (P = 0.002) for the D mean , V5, and V30 values between 11 and 12 C groups suggesting that also for the liver the use of 11 C would be beneficial in terms of dose reduction. 1% for 11 C and 12 C plans, respectively; for the right nerve, these are 3.5% versus 7.6%. The highest worst-case scenario NTCP values observed for 12 C plans were 45.6% and 53.2% for the right and left nerves, respectively; in 11 C, these values for the same patients dropped to 2.9% and 20.4%, respectively (not depicted here). The worst outcome in terms of NTCP was predicted for the patients where either the nerve was overlapping with the target or 'wrapped' by the tumor. Results of the Wilcoxon Signed-Rank reveal a significant difference (P = 0.00006 for the right nerve and P = 0.0003 for the left nerve) between the predicted NTCP values for 11 C and 12 C plans, suggesting that 11 C plans with reduced range margins might be notably safer options compared to conventional 12 C plans. We have additionally calculated the relative risks (RR) as RR = NTCP 11C /NTCP 12C to quantify the relative NTCP difference between different ion plans for nominal and worst-case scenarios patient-by-patient. Figure 4b shows the RR values for both right (dashed bar) and left (solid color bar) nerves. RR < 1 would mean a safer outcome for 11 C plans as compared to 12 C plans. Despite the RR being > 1 for three patients in nominal scenarios,  www.nature.com/scientificreports/ in the worst-case scenarios the use of 11 C with decreased margins might reduce the relative risk down to less than 0.2 in most cases.
Parallel organs: liver. To estimate liver toxicity using the LKB model, we have used six models with different endpoints. The comparison of outcomes from 12 C versus 11 C treatment plans is given in Fig. 5. Apart from variations in the severity of the endpoints (from the liver failure (Fig. 5a) and the radiation-induced liver injury (Fig. 5b) to the potentially slightly less severe toxicities such as changes in albumin-bilirubin (ALBI) score, Child-Pugh (CP) score, or enzymatic changes ( Fig. 5c-f), the models also use different parameters as reported in Table 2. Of note, the NTCP predictions of the Burman model are quite varying from patient to patient and might reach quite high values; we attribute it to the low value of the volume effect parameter which increases the weight of the high-dose regions.
Wilcoxon Signed-Rank test reveals a significant difference (P < 0.01) between the NTCP values for 11 C and 12 C plans predicted by all six models. This suggests that 11 C plans with reduced range margins might be a better option also in terms of sparing the parallel critical structures.
Similar to the ACC plan analysis, relative risks were calculated for an individual patient for every model and are presented in Fig. 6 (top -nominal scenarios, bottom -worst-case scenarios). Depending on the model selected for comparison, 11 C plans would be either safer (Burman or Dawson models) than 12 C ones, or similar (El Naqua and Pursley models) in terms of relative risks. 11 C plans are most beneficial according to the Dawson C NTCP values for individual patients for the right (dashed bar) and left (solid color bars) optic nerves. RR < 1 means that the 11 C plan is beneficial compared to 12

Discussion
The results presented here show the benefits of the elimination of implicit margins, associated with range uncertainties in robustly optimised treatment plans for two different tumor types, that could be potentially achieved on a clinical level by using 11 C beams in combination with the improved PET range verification. In the previous overview of the BARB project 18 , we have already presented a FLUKA comparison of the activity maps of 12 C and 11 C ion treatment plans in the simple geometry (sphere in a water tank). In the first part of the current work, we demonstrated that the annihilation map in the patient, irradiated with 11 C beams, can theoretically be used to precisely determine the positions of the SOBP fall-offs and thus to recognize the shifts in actual dose distributions of about 1 mm. This would mean that radioactive beams, in particular, 11 C studied here, would be an ideal tool for detecting the range changes during patient irradiation, allowing one to apply the necessary plan corrections online or adjust the beam delivery accordingly (or indicating the need of a repeated CT scan in case of large or inconsistent shifts). In turn, the margins to account for the range uncertainty, that are added to the target volume (either explicitly in PTV-based optimization, or implicitly estimated from the value of range uncertainty passed to the system during the robust optimization) can be reduced, or, in theory, eliminated. While the simulations presented here reflect the best possible scenario, where the whole annihilation signal could be used to generate PET images reflecting the shifts in the dose distributions caused by the range uncertainties, and a perfect scanner resolution, the practical feasibility of range margin complete elimination remains an open question. For example, the signal coming from 11 C beams with a half-life of about 20.33 min still might be affected by the washout that might slightly smear the distribution; from this point of view, 10 C beams might be more practical. Moreover, even if the look-up tables for the Hounsfield units would correctly reflect the ion's pathlengths, the precision of the Bragg peak position prediction in the patient would be limited by the CT resolution. Additionally, efficiency of the imaging detectors, the capability of the online image reconstruction software tools and the limited time used to accumulate the signal further limit the best possible achievable image  It is important to point out that collecting the annihilation signal from the full plan delivered to the patient will be impractical if one aims at the online range verification since in this case the corrections can be applied only in the following fraction. The best approach to acquire enough signal in the shortest timespan possible to make any necessary plan modifications is yet to be identified. One option could be irradiating the whole target area but only at a small fraction of the target dose. Alternatively, a 'pilot shot' irradiating only a selected tumor slice with single energy would be faster since there is no time needed to switch between the different energies; however, there will be an increased risk of missing anatomical changes at higher depths. There is a need of a future in-depth Monte Carlo study to estimate how the range monitoring efficiency will be affected by the uncertainties in positron detection, coupled with the biological washout and the fact that only the limited fraction of the irradiation plan should be delivered for the initial online range detection. This would help propose a new more realistic value of range uncertainty to be used in robust treatment planning with 11 C.
Reduced range margins would make treatment planning procedures easier by increasing the chances of meeting the constraints set for the critical structures without compromising the desired target dose coverage. As the ACC plan analysis shows, with the standard margins, more than half of the plans were not acceptable in terms of exceeding the recommended maximal dose to the optic nerves in most of the treatment scenarios, while in the case of reduced margins the number of such 'problematic' plans is significantly reduced.
NTCP calculations for the optic nerves suggest the evident benefit of 11 C plans with reduced margins for serial organs in target proximity. Apart from the reduction of the median value of NTCP estimated for each patient plan, the spread between the best and the worst-case scenarios was significantly reduced compared to 12 C plans. Since the toxicity induction for the serial organ is rather the question of exceeding the certain dose in a point region, the range margin reduction, shrinking the high-dose region already by a millimeter, would play a crucial role in the organ sparing. Thus, the closer the serial OAR to the target structure, the more pronounced toxicity reduction should be expected. This agrees with the study 6 aimed at estimating the NTCP changes following the reduction of the range margins in proton therapy. To note, margin reduction in carbon therapy might have a stronger impact on the toxicity outcomes compared to protons, due to the 2-threefold increase of RBE at the distal edge of the SOBP.
Concerning liver tumors, it is difficult to correlate the decrease of NTCP with some specifics of patient geometry and/or field configuration. While all the NTCP models applied here indeed predict the highest toxicities for the patients with bigger tumors and higher ratios of tumor volume to a healthy liver, to our surprise, the most decreased relative risks were estimated for the patients with smaller tumors.
The less pronounced effect of margin reduction for the liver plans compared to the ACC plans might be at least partially explained by the healthy liver volume definition. Treatment plans were optimized for a uniform CTV dose, whereas the healthy liver was defined as the total liver minus GTV, following standard recommendations 21 . That would mean that for both 11 C and 12 C plans a small fraction of the healthy liver (CTV -GTV) would always www.nature.com/scientificreports/ receive the target dose and thus the margin reduction would not have any pronounced impact on the maximal dose value received by the organ. It is also important to note here, that when comparing liver plans in terms of the NTCP, attention needs to be paid to the differences in the models used. For example, in certain uncertainty scenarios for some patients, the Burman model for liver failure might predict the NTCP values higher than 60%, even though the treatment planning constraints were met. This might be attributed to the low value of the n parameter used in the respective LKB model which de-emphasizes the volumetric effects in the parallel organ and treats it more like a serial one. On the other hand, the Dawson model for the RILI predicts the NTCP values which are practically zero for all the patients, which might make the benefit of the five-time NTCP decrease with 11 C beams questionable. Moreover, the models used here consider different endpoints, among which some of them might be considered more severe than others (e.g., liver failure vs. enzymatic changes). That is why we think that including all these models in the analysis would give a more complete picture. All in all, for the liver plans and NTCP models studied here, 11 C plans reduce liver toxicity following the statistical analysis with the Wilcoxson test.
To further investigate which tumors will benefit the most from the margin reduction with RIBs, especially those close to the parallel organs, broader patient sets with different tumor sizes and their relative locations need to be analyzed. From the same point of view, considering other parallel organs, e.g., the kidney, would be interesting.
Reduction of target margins that are associated with range uncertainties, enabled by the use of 11 C beams, apart from reducing the doses to the adjacent structures by default, might further improve the flexibility and quality of treatment by enabling irradiation from different angles. A recent study 22 has shown further brainstem NTCP decrease from using novel proton beam arrangements, enabled by the reduced range margins. As the authors have shown, new beam arrangements might also enable the possibility of redistributing the LET distributions in the target and increasing its mean value. Concerning the 11 C beams, the redistribution of the stronger LET gradients might be of further benefit for both sparing the OARs and tackling more radioresistant, e.g., hypoxic target regions [23][24][25] .
In the present study, we did not account for target motion that would exacerbate the range uncertainty issue. For ACC tumors, this is a fair assumption. Instead, liver patients studied here were originally receiving SBRT with abdominal compression -for the ion fields with sharp dose fall-offs, this mitigation technique would not be sufficient. Future studies should be thus extended to the 4D robust optimization to include additional motioninduced range changes.

Conclusion
Radioactive ion beams, such as 11 C, can be ideal candidates to tackle the problem of range uncertainty in particle radiotherapy thanks to their sharp activity distribution that can be monitored using PET. In contrast to the PET signal produced by 12 C ions, used nowadays in clinical practice, the signal from 11 C would need a shorter acquisition time, and have a higher peak-to-noise ratio and a well-defined peak that can be better correlated with the dose distribution. All of this would allow in vivo beam monitoring in the patient. Improved precision of range monitoring would lead to smaller or eliminated range margins set for the target, which, in turn, would decrease the volume of normal tissue receiving unnecessary high doses of radiation. We have shown here that it is expected to have a significant positive impact on both the treatment planning outcomes (i.e., meeting treatment planning constraints without compromising the target dose coverage) and the predicted post-treatment toxicities (reduction of NTCP for various endpoints) both serial and parallel organs at risk in target proximity.

Materials and methods
Patient data. For this study, we have retrospectively considered consecutive patients with ACC and liver tumors to evaluate the impact of margin reduction on the damage to the serial (optic nerves) and parallel (liver) organs, respectively.
ACC patients (N = 15) analysed here had tumors located mainly in the frontal areas of the skull and were originally treated in 2002-2008 at GSI in frames of the pilot project, implying a 12 C ion boost following a course of intensity-modulated radiotherapy in Heidelberg 26,27 . The study was approved by the ethical committee of the University of Heidelberg back in 1997. Anonymized treatment plans of all patients are stored for research purposes at the GSI Helmholtzzentrum für Schwerionenforschung in Darmstadt, where treatment was actually delivered, according to the German law. Informed consent is waived by the ethical committee of the University of Heidelberg for anonymized plans used for research plans.
The liver patients (N = 10) studied here have originally undergone photon stereotactic body radiation therapy (SBRT) with abdominal compression at the Department of Radiotherapy of the University Federico II of Naples. Patients treatment at the University Federico II is performed under approval of the ethical committee of the University, specifically IRB 222-10. The approval includes use of anonymized treatment plans for research purposes only.
Treatment planning considerations. GSI in-house treatment planning system (TPS) TRiP98 for active beam scanning was used to perform biologically weighted robust treatment plan optimization 28,29 on a CTV using 12 C and 11 C beams for the plans comparison study. While in a PTV-based optimization the margins are explicitly added to the tumor contours, in the robust optimization in TRiP98 the planning input are the expected values of setup and range uncertainties, and the target margins are generated implicitly during the plan optimization. The initial 3D raster scanner grid, estimated from the CTV contours, gets extended to contain all the pencil beam spots resulting from the interplays of possible range over-and underestimations and setup shifts in all anatomical directions (in total there are 21 uncertainty scenarios). www.nature.com/scientificreports/ In our study, the treatment planning uncertainties included ± 3.5% ( 12 C) or 0% ( 11 C) range uncertainty and ± 3 mm shifts in all cardinal directions to consider positioning uncertainties.
The magnitudes of uncertainties for 12 C were preserved from a previous work on robust optimization 29 (where they were adapted from the proton study 30 ) for both sets of patient plans, even though for the liver with more homogeneous structure the range uncertainties might be slightly too conservative 31 . Target motion was not considered. The planning objective was to achieve at least 95% of the prescribed dose in at least 95% of the CTV volume in at least 95% of treatment scenarios, obeying the OAR constraints listed below. If a compromise could not be achieved, target coverage was favoured over OAR sparing.
ACC . We have preserved the original two-field configuration as well as the OAR contours from the original treatment plans from the pilot project, adjusting them to a 20 × 3 Gy(RBE) fractionation scheme. The planning constraint for the optic nerves was estimated following the European Particle Therapy Network (EPTN) recommendations 32 to D 0.03cm3 < 2.46 Gy(RBE) per fraction. The α/β ratio of both tumor and normal tissues was assumed to be 2 Gy 32,33 .
Liver tumors. Depending on the tumor location relative to the OARs, a single-or double-field configuration was chosen for each tumor. Where possible, a fractionation scheme of 4 × 10 Gy(RBE) was used, otherwise, it was changed to 6 × 7 Gy to reduce the dose to the serial OAR in target proximity. Planning constraints were adapted to the fractionation schemes following the recommendations of the American Association of Physicists in Medicine group (AAPM) 34 . The α/β ratio of the tumor was assumed to be 10 Gy 35,36 , and 2 Gy for the healthy liver 37,38 .
The details for each treatment plan are given in Table 1.
Base data. Apart from the patient-specific input (CT scans and RT structure contours, prescribed doses, and field configuration), TRiP98 requires a valid physical beam model describing the interaction of an ion with water, the reference medium in radiotherapy. The so-called base data includes laterally integrated depth-dose distributions in water and energy-dependent fragment spectra, necessary for the calculation of biological effects, at representative depths in water 39 . For this study, a new set of base data for 11 C, which were not previously used in TRiP98, was generated with the FLUKA Monte Carlo code (version 2020.1.10) 40 , which was also used for calculating annihilation maps Table 1. Adenoid cystic carcinoma (ACC) and liver patient plans characteristics summarizing the numbers and sizes of clinical tumor volumes, fractionation schemes (number and dose per fraction), numbers of fields, and organs at risk considered during the plan optimization. www.nature.com/scientificreports/ (see next section). To keep consistency between the two programs (FLUKA and TRiP98), energy loss tables for the primaries and all the fragments were extracted from FLUKA and used in TRiP98 together with the new set of base data. Additionally, still for consistency, a new set of base data for the 12 C was also specifically generated for this work. For both ion beams the FLUKA simulations were performed using the recommended default setting for ion beam therapy applications (cf. defaults 'HADROTHErapy'). For both isotopes, the beam spot was assumed to be Gaussian with a full width at half maximum (FWHM) of 8 mm in the isocenter. The Bragg peak broadening due to the use of a ripple filter was considered by the introduction of a momentum spread (dp/p) of 1% for all energies. Energy dependent fragment spectra and laterally integrated depth dose distributions were calculated in the energy range between 80 and 440 MeV/u, for beams impinging directly from vacuum to water, as in TRiP98 the energy degradation in the nozzle is considered through an adjustable offset value. A more in-depth description of the Monte Carlo calculation of base data for TPS can be found in 41 .

Monte Carlo simulation of annihilation maps for range shifts detection.
To prove whether the information from the 11 C β + -decays can be at least theoretically used to precisely detect shifts between the planned and actual dose distributions, we have performed an additional TRiP98-FLUKA study in a patient geometry.
As a first step, we produced a treatment plan for a patient using the look-up tables for converting the HU into the ion's WEPL (Hounsfield look-up table, or HLUT) extracted from FLUKA. To simplify the output and its interpretation, here the robust optimization was omitted; the plan was optimized for a uniform RBE-weighted dose of 3 Gy delivered with a single field. Then, the optimization procedure was repeated using the same HLUT scaled either by 1% or by 3.5% to emulate the range uncertainties associated with CT interpretation. As a result, three different raster scanner files containing information on energies and numbers of particles were produced. If loaded back into the system for the dose calculation with the default HLUT, only the first file would yield the dose distribution matching the target contours, while the other two would cause partial underdose and hotspots in the nearby healthy tissue.
In the second step, we loaded the patient CT and the raster scanner files in FLUKA to calculate the physical (absorbed) dose and annihilation distributions (as a best possible scenario the integral annihilation signal was scored). In this case the FLUKA built-in material description and HU-WEPL conversion were used for all three scenarios.
A specifically developed 'source.f ' user routine, similar to the one described in 41 was used to read the TRiP98 raster scanner files, specifying the sequence of the beam delivery, in FLUKA. This routine allows the possibility of full Monte Carlo simulation of TPS-optimized plans while using the same beam settings adopted for the base data calculations. In addition to the calculation of the annihilation maps in a realistic treatment planning scenario, this routine was used for consistency check between the MC calculations and the expected dose deposition distributions obtained with TRiP98.
As a result, we have produced three sets of dose and annihilation maps that were visualized and analyzed.
NTCP evaluation. All dose maps produced following ACC and liver patient plans optimization were first converted from Voxelplan (native TRiP98 format) into DICOM RT format using 3D Slicer 42 and then into Matlab-readable format (MathWorks, Natick, MA, USA) for further analysis. Next, each dose map has been voxelwise converted into 2-Gy equivalent dose (EQD2) by an in-house developed function for Matlab 43,44 . For the ocular structures in the ACC plans, the linear-quadratic model 45 was used with α/β = 2 Gy 33 . Instead, for the liver plans where high doses per fraction were planned (> 6 Gy per fraction), the EQD2 conversion was performed under linear-quadratic-linear condition 46 , using the parameters α/β = 2.5 Gy, γ/α = 5 Gy, d t = 5 Gy as reported in 47,48 . NTCP values were estimated using the Lyman-Kutcher-Burman (LKB) model 49,50 with organ-and toxicityspecific parameters from the models reported in literature. For the optic nerves, only one model for the optic neuropathy endpoint was applied. Instead, for the healthy liver (liver minus GTV), several models related to endpoints of different severities exist and were included in the analysis. The models and their key parameters are summarized in Table 2. Table 2. Summary of the LKB NTCP model parameters used in the study. TD 50 is the value of the uniform dose given to the entire organ surface corresponding to the 50% probability to induce toxicity; m is inversely proportional to the slope of the dose-response curve; and n (or a) accounts for the volume effect. www.nature.com/scientificreports/ As pointed out by 48,52,53 , the mean total (multiplied by the number of fractions) dose D mean , as well as the V5 and V30 representing the fractions of the healthy liver receiving the total dose of more than 5 or 30 Gy, respectively, can be used as predictive parameters for estimating the hepatic toxicity.

Statistical analysis.
Wilcoxon signed rank test with a threshold significance level of 0.05 was used to study the differences between the 12 C and 11 C treatment planning and NTCP modelling outcomes and was performed with GraphPad Prism Software V9.4.0. The test results are represented in the respective plots using asterisks as follows: ****P < 0.0001-extremely significant; 0.0001 < ***P < 0.001-extremely significant; 0.001 < **P < 0.01very significant; 0.01 < *P < 0.05 -significant, P ≥ 0.05-not significant (ns).

Data availability
The datasets used and/or analysed during the current study available from the corresponding author on reasonable request.